Skip to content

[data] feat: add support for swe rebench v2#56

Merged
yyDing1 merged 6 commits into
mainfrom
swe-rebench-v2
Jun 7, 2026
Merged

[data] feat: add support for swe rebench v2#56
yyDing1 merged 6 commits into
mainfrom
swe-rebench-v2

Conversation

@yyDing1

@yyDing1 yyDing1 commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

as title

Checklist Before Starting

  • Search for similar PRs or issues and paste at least one relevant link here: ...
  • Format the PR title as [{modules}] {type}: {description} (checked by CI)
    • {modules} may include core, interaction, model, env, tools, deployment, reward, dashboard, docs, examples, data, train, ci, build, deps, misc
    • If this PR involves multiple modules, separate them with , like [interaction, tools, docs]
    • {type} must be one of feat, fix, refactor, chore, test
    • If this PR breaks an API, config contract, workflow, or other compatibility boundary, add [BREAKING] to the beginning of the title
    • For a stacked PR series, you may prepend a progress marker such as [1/N]
    • Example: [BREAKING][deployment, docs] feat: simplify runtime env configuration

Test

List the checks you ran. If CI coverage is not practical for this change, describe the manual validation or experiment results.

API and Usage Example

Show any public interface changes or updated usage examples if relevant.

# Add a short example here when the PR changes public behavior

Design & Code Changes

Summarize the approach for non-trivial changes and call out important implementation details or trade-offs.

Checklist Before Submitting

  • Read the Contribute Guide
  • Run pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add or update docs/examples for user-facing changes
  • Add tests or explain why tests are not practical
  • Confirm the PR title matches the required format
  • Confirm the placeholder text in this template has been replaced with real content

@yyDing1 yyDing1 merged commit 5b7a475 into main Jun 7, 2026
3 checks passed
@yyDing1 yyDing1 deleted the swe-rebench-v2 branch June 7, 2026 17:17

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for the nebius/SWE-rebench-V2 dataset (Python slice), including a preprocessing script, a dedicated reward spec, and test-log parsers. Feedback on these changes highlights several robustness issues: a potential NameError in the preprocessing script when no languages are specified, potential KeyError and IndexError exceptions when parsing metadata, and a potential AttributeError if a patch is not a string. Additionally, immediately stripping whitespace from command observations in env.py could strip crucial formatting like indentation or tracebacks, so it is recommended to only strip during the empty check.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +189 to +194
if languages:
wanted = {lang.lower() for lang in languages}
dataset = dataset.filter(lambda ex: (ex.get("language") or "").lower() in wanted)
dataset = dataset.filter(lambda ex: ex["instance_id"] not in SKIP_SAMPLES)

print(f"Kept {len(dataset)} instances after language filter {sorted(wanted)}", flush=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If languages is falsy or None, the wanted variable is never defined. This will raise a NameError: name 'wanted' is not defined when printing the log on line 194. Defining wanted beforehand or conditionally formatting the log message resolves this issue.

    wanted = None
    if languages:
        wanted = {lang.lower() for lang in languages}
        dataset = dataset.filter(lambda ex: (ex.get("language") or "").lower() in wanted)
    dataset = dataset.filter(lambda ex: ex["instance_id"] not in SKIP_SAMPLES)

    filter_desc = sorted(wanted) if wanted else "None"
    print(f"Kept {len(dataset)} instances after language filter {filter_desc}", flush=True)

Comment on lines +141 to +142
"log_parser": install_config["log_parser"],
"test_cmd": install_config["test_cmd"],

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If example["install_config"] is None, install_config defaults to an empty dictionary {}. Accessing install_config["log_parser"] and install_config["test_cmd"] directly will then raise a KeyError. Using .get() is safer.

Suggested change
"log_parser": install_config["log_parser"],
"test_cmd": install_config["test_cmd"],
"log_parser": install_config.get("log_parser"),
"test_cmd": install_config.get("test_cmd"),

Comment on lines +99 to +100
# Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2).
self.repo_dir = f"/{self.repo.split('/')[1]}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If self.repo is not in the expected owner/repo format (e.g., if it is just a single repository name), self.repo.split('/') will only have one element, causing an IndexError when accessing index 1. Adding a defensive check prevents potential crashes.

Suggested change
# Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2).
self.repo_dir = f"/{self.repo.split('/')[1]}"
# Upstream clones into /{repo.split('/')[1]} (see combine.Dockerfile.j2).
repo_parts = self.repo.split("/")
self.repo_dir = f"/{repo_parts[1]}" if len(repo_parts) > 1 else f"/{self.repo}"

Comment on lines +53 to +61
def _get_modified_files(patch: str) -> list[str]:
"""Target (``b/``) paths touched by a unified diff, order-preserved & deduped.

A tiny self-contained replacement for ``swebench.harness.utils.get_modified_files``
(only used to reset the test files before/after applying the test patch).
Files deleted by the patch (``+++ /dev/null``) are skipped.
"""
files: list[str] = []
for line in patch.split("\n"):

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If patch is None or not a string, calling patch.split("\n") will raise an AttributeError. Adding a defensive type check ensures the function handles unexpected or empty inputs gracefully.

Suggested change
def _get_modified_files(patch: str) -> list[str]:
"""Target (``b/``) paths touched by a unified diff, order-preserved & deduped.
A tiny self-contained replacement for ``swebench.harness.utils.get_modified_files``
(only used to reset the test files before/after applying the test patch).
Files deleted by the patch (``+++ /dev/null``) are skipped.
"""
files: list[str] = []
for line in patch.split("\n"):
def _get_modified_files(patch: str) -> list[str]:
"""Target (b/) paths touched by a unified diff, order-preserved & deduped.
A tiny self-contained replacement for swebench.harness.utils.get_modified_files
(only used to reset the test files before/after applying the test patch).
Files deleted by the patch (+++ /dev/null) are skipped.
"""
if not patch or not isinstance(patch, str):
return []
files: list[str] = []
for line in patch.split("\n"):

Comment on lines +162 to +163
observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation).strip()
if observation == "":

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Calling .strip() immediately on the observation modifies the returned command output by removing all leading and trailing whitespace/newlines. This can affect formatting (e.g., indentation in file content or traceback outputs) when presented to the agent. It is safer to preserve the original whitespace and only strip it for the empty check.

Suggested change
observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation).strip()
if observation == "":
observation = re.sub(r"\x1b\[[0-9;]*m|\r", "", observation)
if observation.strip() == "":

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant